2023-08 August Reads

Created by: Marc Leprince
Created time: Aug 9, 2023 3:24 PM
Last edited time: Aug 16, 2023 8:22 PM
  • Are Software companies worth it? - from Jamin Ball’s Clouded Judgement Blog
    • Very interesting read on tech companies from a financial point of view: what they are worth and what multiples they deserve relative to their stock prices.
    • Essentially - hitting a desirable cash flow generation target is very hard, and because many tech companies won’t hit it, there’s reason to believe many of them are overvalued.
  • Strategies & Techniques for Debugging SAS program Errors & Warnings - 2013 WUSS White paper
    • Covers a good framework for categorizing errors - compile-time errors (syntax, semantic such as misspellings, macro) versus run-time errors (execution-time such as data not sorted, data, macro)
    • Also contains useful SAS options for debugging - e.g. DSNFERR: stop processing when a referenced data set does not exist
  • Data Architecture Evolution - Blog from Diogo Santos
    • Data Warehouse as the first, basic architecture - SQL is the foundation: traditional sources, ETL to a target with change data capture processes, then serve up in reports/visualizations
      • Pain points: ETL jobs & tables quickly spread into disarray without strict controls, and the warehouse doesn’t allow for big data or unstructured data. On/offboarding sources is a pain. Schema on write
    • Data Lakes were created to help data scientists with model training processes
      • need access to large variety & volume of data
      • often use HDFS & other parallel processing frameworks
      • Uses ELT (schema on read)
      • After Transform - it feeds DW or feature stores
      • Can contain “layers” of cleansed data, raw, bronze, silver, gold…
      • also offloads the T in ELT to data science teams, so it is no longer an IT responsibility
      • But data lake can quickly turn into a swamp if it isn’t cleaned and maintained properly
      • tough to track sources of datasets…
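To make the schema-on-write vs. schema-on-read distinction from the notes above concrete, here is a minimal sketch in Python. The `SCHEMA` fields, function names, and sample records are all made up for illustration; they are not from the blog post.

```python
import json

# Hypothetical schema, for illustration only.
SCHEMA = {"id": int, "amount": float}

def write_with_schema(table, record):
    """Schema on write (warehouse style): validate before storing."""
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    table.append({field: record[field] for field in SCHEMA})

def read_with_schema(raw_lines):
    """Schema on read (lake style): keep raw text, apply the schema when reading."""
    rows = []
    for line in raw_lines:
        rec = json.loads(line)
        try:
            rows.append({"id": int(rec["id"]), "amount": float(rec["amount"])})
        except (KeyError, ValueError, TypeError):
            continue  # unreadable under this schema; the raw line stays in the lake
    return rows

warehouse = []
write_with_schema(warehouse, {"id": 1, "amount": 9.5})

# The lake accepts anything; structure is only imposed at read time.
lake = ['{"id": "2", "amount": "3.25", "note": "free text"}', '{"id": "x"}']
parsed = read_with_schema(lake)
```

The warehouse rejects a bad record at load time, while the lake accepts everything and silently drops what it can't parse later, which is one way the "swamp" problem creeps in.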
    • Cloud data lake
      • Real-time (RT) structures like Kafka emerge & cloud tech makes integrating tech easier & cheaper
      • to offer all this functionality, there is a lot of complexity behind the scenes, & can take a long time to produce sharable data
    • Data Mesh Architecture
      • Newest approach - takes the microservices answer to monolithic applications and applies it to data architecture
      • data is decentralized & owners of data are spread across domains that are responsible for modeling, storage, governance & architecture…
      • Domains - “microservice” of people that are responsible for data products, infra, governance, and Mesh API
      • It becomes a mesh when all these data components feed others and it is all trackable
  • Linux Tips from Paul Brown
    • Excellent and useful tips and explanations of how the . [] {} <> and other sorts of characters are used throughout bash commands and how to read and use them.
  • BCBS Progress report (2019) - Deloitte Commentary on BCBS BIS report for GSIB banks
    • Focuses on governance and infrastructure, risk & data aggregation capabilities, and risk reporting practices - it’s clear regulators’ standards for quality & availability of data have increased
    • Governance - establishing clear ownership and accountability for data, with independent units for validating data quality and data management (CDO setting data quality standards)
    • Data architecture & IT Infra - lack of compliance and need to invest resources in talent and tools to support the requirements
    • Accuracy & integrity, timeliness, adaptability of data are all things that are dependent on data architecture & IT infra
    • banks should develop a comprehensive strategy, with gap analysis and roadmaps to remediation with clear issues they are trying to resolve
    • replace EUCs (end-user computing tools) with more integrated reporting platforms
    • upgrade data dictionaries
    • Need to be able to understand the transformations that occur from data source to a report in production, and the effectiveness of the business process
    • refine CDEs (critical data elements) - identify the CDEs that are key to the org at 2 levels (corp & business); these can be used to prioritize assignment of controls and review
    • data lineage with clear ownership model throughout data lifecycle
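
As a rough illustration of the lineage and ownership points above, here is a minimal Python sketch that walks from a report back to its sources and flags datasets with no owner. The dataset names, owners, and the `LINEAGE` structure are all hypothetical, not taken from the Deloitte commentary.

```python
# Hypothetical lineage map: each dataset lists its owner and direct upstream inputs.
LINEAGE = {
    "risk_report": {"owner": "risk_reporting", "inputs": ["exposure_agg"]},
    "exposure_agg": {"owner": "risk_data", "inputs": ["loans_raw", "trades_raw"]},
    "loans_raw": {"owner": "lending", "inputs": []},
    "trades_raw": {"owner": "markets", "inputs": []},
}

def trace_upstream(dataset, lineage=LINEAGE):
    """Return every dataset that feeds `dataset`, each listed once."""
    seen = []
    stack = list(lineage[dataset]["inputs"])
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(lineage[current]["inputs"])
    return seen

def unowned(lineage=LINEAGE):
    """Return datasets that have no owner recorded, sorted by name."""
    return sorted(name for name, meta in lineage.items() if not meta.get("owner"))

upstream = trace_upstream("risk_report")
missing_owners = unowned()
```

With a map like this, "which sources feed this report, and who owns each one?" becomes a lookup rather than an investigation.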